Edible oil analysis system and method

ABSTRACT

The present disclosure provides a method and system for analysing one or more edible oil samples. In an embodiment the disclosure provides for calibrating the matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data obtained for one or more edible oil samples to obtain calibrated spectral data; and comparing the calibrated spectral data derived from the one or more samples against a library of calibrated MALDI-MS spectra for a plurality of edible oil samples to determine the most likely composition of the one or more edible oil samples.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a divisional application of U.S. patent application Ser. No. 15/628,043 filed on Jun. 20, 2017, the entire content of which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to a system and method for identification of edible oils present in a sample using spectroscopic analysis, specifically matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS), and matching sample data with a spectral library of potential candidates.

BACKGROUND OF THE INVENTION

Increasing consumer health awareness and concern as to the ingredients present in foods has meant that all entities in the food supply chain need to demonstrate the integrity and purity of the ingredients used.

The integrity and purity of edible oils, particularly edible oils used for cooking, has attracted significant popular attention. In the Guangdong region of China (including Hong Kong) and also in Taiwan there have been many cases of adulterated oils (“gutter oils”) where recycled oils have been used in place of quality edible oils.

One approach to determining the authenticity of an edible oil sample is to focus on detecting food residue markers in the samples. Such markers may include capsaicinoids (marker of chilli peppers), eugenol (marker of seasonings), undecanoic acid (marker of heated vegetable oils) and 13-methyl tetradecanoic acids (marker for animal oils). However, focussing on the presence or absence of these markers alone is not determinative of the authenticity of the edible oil sample, as unscrupulous suppliers may simply remove such markers.

Therefore, analysis using gas chromatography-flame ionisation detector (GC-FID) is the ISO-standard method to authenticate edible oils. In this method, the oil sample is first hydrolyzed using sodium hydroxide, and then converted to methyl esters of fatty acids using a boron trifluoride catalyst. The fatty acid methyl esters are then separated and detected using GC-FID.

The identities of the edible oil samples are then confirmed according to their fatty acid composition. Unfortunately, these procedures are relatively time-consuming and labour-intensive, meaning that handling large numbers of samples is not practical. Other methods such as liquid chromatography (LC)-based methods have similar scalability problems.

In one effort to address the above deficiencies, a simple analytical protocol using matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) has been developed.

(See e.g. Ng, T. T.; So, P. K.; Zheng, B.; Yao, Z. P., Rapid screening of mixed edible oils and gutter oils by matrix-assisted laser desorption/ionization mass spectrometry. Anal. Chim. Acta 2015, 884, 70-76.)

MALDI-MS is widely used for the analysis of biological samples including lipids, and has simple sample preparation, short analysis time, and high sensitivity without chromatographic separation. The mass spectra of edible oils obtained from MALDI-MS are widely accepted as “fingerprints” for differentiation and authentication.

The spectrum produced by MALDI-MS is dominated by signals corresponding to diacylglycerol (DAG)-like fragments and sodium adducts of triacylglycerols (TAGs). The lower mass region (100-500 Da) and higher mass region (1000-2000 Da) have also been investigated but typically no significant signals were obtained.

Heating and mixing of edible oils changes the composition of the edible oils, altering the TAG patterns of their MALDI-MS spectra. Prolonged heating of oils typically causes degradation of TAGs and formation of compounds such as diacylglycerols, monoacylglycerols, free fatty acids, oxidised TAGs and TAG polymers. This change may be detected by comparison with reference spectra to determine the identity and/or composition of a sample being tested.

In one approach to processing the MALDI-MS spectrum to determine the authenticity/identity of a sample, principal component analysis (PCA) (a form of statistical analysis) is used to conduct a spectral comparison of the TAG peaks in the edible oil sample being analysed.

However, manually processing a MALDI-MS sample and using PCA to identify a sample or verify the authenticity of a sample includes several non-trivial steps. The initial step of extracting the relevant peaks requires manual selection by an operator who has been trained to know which peaks to select.

The PCA process itself can also be a source of error and complexity, because for some types of oil, such as sunflower oil and grape-seed oil, the TAG composition is similar to that of other oil species. This means several PCA steps are needed to differentiate the oils, which can be very tedious.

Finally, the output of the PCA process is the projection of the input data onto the PCA plot, which compares the similarity between the sample and the data in the database. However, the identity of the sample cannot be displayed directly. A trained operator is still required to draw a reasonable conclusion from the PCA result.

TAG patterns are considered to be the “fingerprint” of the edible oils. However, some species share highly similar TAG patterns (such as olive oil and avocado oil) which cannot be easily differentiated by PCA.

Previous approaches using PCA have only considered TAG-relevant peaks. Other peaks, such as oxidation-related and cyclic peptide-related peaks, may also be useful. However, those additional peaks cannot be directly accommodated using the established PCA model, requiring remodelling of the PCA to fit the new information. Again, this process requires experienced and trained operators.

Unfortunately, therefore, when using PCA to analyse MALDI-MS spectra, the spectra still need to be processed and matched manually by highly trained personnel, which is expensive, laborious and time-consuming and limits the capacity to scale MALDI-MS systems for analysis of edible oils.

Accordingly, there exists a significant gap between what MALDI-MS analysis of edible oils needs to deliver and what current approaches provide.

It is an object of the present invention to provide a system and method which at least addresses or ameliorates some of the above problems or provides the public with an alternative choice.

SUMMARY OF THE INVENTION

Features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realised and obtained by means of the instruments and combinations particularly pointed out in the appended claims.

In accordance with a first aspect of the present invention, there is provided a method for analysing one or more edible oil samples, the method comprising:

calibrating the matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data obtained for one or more edible oil samples to obtain calibrated spectral data; and

comparing the calibrated spectral data derived from the one or more samples against a library of calibrated MALDI-MS spectra for a plurality of edible oil samples to determine the most likely composition of the one or more edible oil samples.

The comparison between calibrated MALDI-MS data derived from the one or more samples and a library of calibrated MALDI-MS spectra for a plurality of edible oil samples may be performed using a cosine similarity test.

The most likely composition of the one or more edible oil samples may be determined after ranking, based on the calibrated MALDI-MS sample data, a plurality of library spectra of known edible oil types according to their likelihood of being in the sample, from most likely to least likely using cosine similarity test scores; and determining the most likely identification of the one or more edible oil samples based upon the highest ranked cosine similarity score.

Optionally, the cosine similarity test may be conducted on one or more regions of the sample spectral data selected from the group comprising high mass, low mass and TAG regions and one or more corresponding regions of the edible oil samples in the library of calibrated MALDI-MS spectra.

Calibration may be performed by using at least one of TAG peaks and 2,5-dihydroxybenzoic acid (DHB) matrix peaks as reference peaks.

Following calibration and before comparison with the calibrated library data, the sample data may be quantised by data binning. Data binning may be performed by dividing the entire spectra into intervals of 0.5 m/z, averaging the intensity of all readings within each bin, and setting the m/z reading as the m/z value at the middle of the interval.

Following binning, the sample data may be normalised by dividing the intensity of each bin by either the maximum intensity or the total intensity of all bins, multiplying by an appropriate scaling parameter and rounding to the nearest integer.

The comparison between calibrated MALDI-MS data derived from the one or more samples and a library of calibrated MALDI-MS spectra for a plurality of edible oil samples may also be performed using a statistical test selected from the group comprising characteristic peak matching methods, partial least squares discriminant analysis (PLS-DA), or decision tree-based methods.

In a further aspect of the present disclosure there is provided an edible oil sample identification system comprising:

an input system configured to receive matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data for at least one or more edible oil samples;

a library comprising a plurality of calibrated MALDI-MS spectral data for a plurality of edible oil samples; and

a processor configured to calibrate the MALDI-MS data derived from the one or more samples and compare it with the library to determine a predicted composition of the one or more edible oil samples.

The processor may be configured to compare the calibrated spectral data derived from the one or more samples with the library using a cosine similarity test, and to output the determined cosine similarity score for a plurality of samples of the library.

The processor may be configured to conduct the cosine similarity test on one or more regions of the sample spectral data selected from the group comprising high mass, low mass and TAG regions and one or more corresponding regions of the data in the library.

Calibration of the MALDI-MS spectral data for the one or more samples and for the spectral data of the library may be performed by using at least one of TAG peaks and DHB matrix peaks as reference peaks.

Following calibration and before comparison against the calibrated library data, the processor may be configured to quantise the sample data by data binning.

In a further aspect of the present disclosure there is provided a method of generating a library of matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data for a plurality of reference edible oil samples of a plurality of different types for identifying a sample other than the reference edible oil samples, the method comprising:

providing a plurality of reference samples of edible oil samples having a known type;

calibrating the MALDI-MS data for each of the plurality of reference edible oil samples by using at least one of TAG peaks and DHB matrix peaks as reference peaks;

quantising the MALDI-MS data for the edible oil samples by data binning;

normalizing the MALDI-MS data for the edible oil samples by dividing the intensity of each bin by either the intensity of the maximum bin or the total intensity of all bins; and

associating the normalised MALDI-MS data with the edible oil type.

The present disclosure may also include a computer program product, tangibly stored on a machine readable storage device, the product comprising instructions operable to cause a processor to:

calibrate the matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data obtained for one or more edible oil samples; and

compare the calibrated spectral data derived from the one or more samples against a library of calibrated MALDI-MS spectra for a plurality of edible oil samples to determine the most likely composition of the one or more edible oil samples.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings.

Preferred embodiments of the present invention will be explained in further detail below by way of examples and with reference to the accompanying drawings, in which:

FIG. 1a depicts an exemplary MALDI mass spectrum for canola oil, showing the full spectrum from 500-1000 Da;

FIG. 1b is an enlarged view of a TAG region of FIG. 1a.

FIG. 1c shows the characteristic peaks of canola, peanut, olive and sunflower edible oils, focussing on the TAG region of these oils.

FIGS. 2a-2f show exemplary MALDI-MS spectra of various recycled oils.

FIGS. 3a-3e depict exemplary MALDI-MS spectra for mixtures of canola and olive oils at various concentrations.

FIG. 4a depicts an exemplary PCA plot of different types of oil in which the oils can be divided into 4 groups (the same results obtained by hierarchical clustering).

FIG. 4b depicts an exemplary subsequent PCA analysis of group 2 of the plot of FIG. 4a in which a further PCA is conducted to further identify different types of oil.

FIG. 5a depicts an exemplary flow diagram of an embodiment of the system of the present disclosure.

FIG. 5b depicts the core of the flow diagram of FIG. 5a and how the workflow proceeds.

FIG. 6a shows the exemplary data of characteristic spectra stored for a known sample of a library database, depicted graphically to the user.

FIG. 6b shows an exemplary representation of an enlarged portion of the spectral region for the sample of FIG. 6a.

FIG. 7a depicts a strong correlation between an unknown sample of a pure oil and the reference sample for palm oil stored in the library.

FIG. 7b depicts the results of an analysis of an unknown sample of an adulterated oil and the low correlation score with either peanut or olive spectra in the reference sample.

FIG. 8a is an exemplary sample of a bad TAG spectrum of sunflower oil which would not be stored in the database.

FIG. 8b is an exemplary sample of a good TAG spectrum of sunflower oil which would be stored in the database as a reference sample.

FIG. 9a is an exemplary sample of a mass spectrum analysis of a corn oil in which calibration has not been conducted properly.

FIG. 9b shows the same sample of a corn oil after calibration has been conducted properly.

FIG. 9c shows the reference sample of TAG spectrum for corn oil for reference.

FIG. 9d shows another sample in which the TAG spectrum of camellia oil is poorly calibrated and has poor resolution.

FIG. 9e shows the sample of FIG. 9d in which the resolution and calibration have been rectified.

FIGS. 10a(i)-10a(iii) show an exemplary sample mislabelled as flaxseed oil.

FIG. 10b shows the GC-FID analysis of the sample which supports the conclusion this sample has been mislabelled.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the scope of the present disclosure.

The disclosed technology addresses the need in the art for a scalable, reliable technique for analysis of edible oils.

Typical MALDI Protocol for Analysis of Samples (Library and Unknown Samples)

A typical MALDI-MS protocol involves mixing a matrix solution and a sample solution, and the mixture is subsequently allowed to dry onto the MALDI plate. After the formation of sample and matrix crystals, the MALDI plate is inserted into the MALDI-MS instrument for analysis.

No chromatographic separation is involved in MALDI-MS analysis, thus allowing rapid analysis of edible oil samples.

In the exemplary MALDI-MS protocol, edible oil samples can be loaded directly onto a MALDI plate pre-deposited with a matrix layer for the MALDI-MS analysis. Sample loading for one sample can be finished within several seconds and around three hundred samples can be loaded onto the same MALDI plate.

In the specific steps of sample loading according to an exemplary approach, aliquots of 1 μL of 100 mg mL⁻¹ DHB in acetone were loaded onto spots of the MALDI plate and air-dried to form matrix layers. About 0.2 μL of each oil sample was then transferred by pipette tip or cotton tip to form a thin oil layer on the matrix layer.

The plate was then introduced into the mass spectrometer for MALDI-MS analysis.

An Ultraflex Xtreme MALDI-TOF/TOF mass spectrometer (Bruker Daltonics, Germany) was used for the analysis. The laser of the MALDI source was a Smartbeam-II (Nd:YAG, 355 nm) pulse laser operating at a frequency of 2000 Hz. The mass spectrometer was operated in positive reflectron mode. The settings of positive reflectron mode for the ion source 1, source 2, lens, reflector 1, reflector 2, and pulsed ion extraction were 20.00 kV, 17.75 kV, 7.00 kV, 21.10 kV, 10.85 kV, and 140 ns, respectively. The sample rate and digitizer were set to 5.00 GS/s. Extended mass ranges were employed if necessary.

The mass spectrometer was calibrated with a PEG solution mixture (PEG600/PEG1000/PEG2000/NaI=1/2/2/5 (v/v)).

The spectral acquisition was performed using the flexControl 3.4 (Bruker Daltonics, Germany) program. The mass spectra were analysed using the flexAnalysis 3.4 (Bruker Daltonics, Germany) program. The Centroid algorithm was used for peak detection.

Referring to FIGS. 1a-1c, there are shown typical spectra resulting from MALDI-MS analysis of selected edible oil samples.

Referring now to FIG. 1a, the full MALDI-MS spectrum of a canola oil sample from 500-1200 m/z is shown. There can be seen a dominant triacylglycerol peak at 907.8 Da, together with a smaller peak at 881.8 Da. There are also a number of peaks in the fragment region clustered around 603.6 Da. The peaks above 920 Da are typical of oxidation/thermal products.

Peaks clustered around 935.8 Da may be oxidised TAGs, and the peak at 1059.8 Da appears to correlate with a TAG-fragment cluster ion.

In FIG. 1b, which is an enlarged portion of the region of FIG. 1a from 850-920 m/z, the same two TAG peaks can be seen, with the letters representing various fatty acids, including P (palmitic acid), O (oleic acid), L (linoleic acid) and Ln (linolenic acid).

Finally, FIG. 1c depicts peaks of MALDI-MS spectra for a number of edible oil samples of different types, including (i) canola, (ii) peanut, (iii) olive and (iv) sunflower, with their characteristic TAG peaks (and to a lesser extent their DAG-like peaks).

FIG. 2 depicts TAG peaks of various samples of recycled oils, which were obtained from the National Analytical Centre, Guangzhou, and whose TAG patterns appear quite different from those of the pure edible oils of the database.

FIGS. 3a-3e depict typical MALDI-MS spectra which have been obtained by analysis of various mixtures of olive and canola oil. Particular emphasis is given to the changes in the characteristic peaks as the relative proportions of the components of the mixture are changed. It can be seen that there is a characteristic TAG peak at 908 Da, with secondary peaks at 882 Da being distinctive indicators for the presence of canola oil in FIG. 3a. As shown in FIGS. 3a-3e, when the proportion of olive oil increases, the intensity of the peak at 882 Da increases. The changes in the spectra allow differentiation of various mixtures of olive and canola oil.

Referring now to FIG. 4a, there is shown the output from the prior art approach to sample identification using PCA, once peaks such as those of FIGS. 1, 2 and 3 have been obtained for a sample.

PCA converts observations of possibly correlated variables into a set of values of linearly uncorrelated variables using an orthogonal linear transform. This technique allows for visualising and processing of high dimensional datasets while retaining as much of the variance of the dataset as possible.

In conducting the principal component analysis of the sample, a score plot is generated from first and second principal TAG components of the sample.

Results of using PCA on TAG components showed that samples from the same species were clustered individually and different vegetable oil species could be clearly differentiated from each other (see FIG. 1 below and Ng, T. T.; So, P. K.; Zheng, B.; Yao, Z. P., Rapid screening of mixed edible oils and gutter oils by matrix-assisted laser desorption/ionization mass spectrometry. Anal. Chim. Acta 2015, 884, 70-76.)

In FIG. 4a, the results of MALDI-MS on the samples can be divided into essentially four groups, as indicated by the outlines.

Group 1 (10) is likely to be peanut oil, Group 2 (12) flaxseed, Group 3 (14) vegetable oils with TAG patterns similar to olive oil and Group 4 (16) other vegetable oils.

Referring to FIG. 4b, there can be seen the results of a further PCA analysis in which sub group 2 is further analysed. Samples of canola/rapeseed oil, sesame oil, rice bran oil and cotton-seed oil (spots located at the lower half) are likely to be differentiated from the other oils (spots located at the upper half) in sub group 2. The other oils in sub group 2 will need further PCA analysis to be differentiated completely.

However, as noted in the background to the present invention, PCA analysis suffers from a number of deficiencies which mean that it is not scalable, and must be performed by skilled operators.

Accordingly, the present disclosure provides a method and system which addresses these deficiencies, and which enables a robust, scalable technique for analysis, verification and identification of edible oils.

Referring to FIG. 5a, there can be seen the three main groups of steps in the method and system of the present disclosure.

Referring to component 50, there is disclosed a series of steps associated with producing a library of MALDI-MS spectral data. A plurality of edible oil samples having a known type and origin are selected for assessment at step 52. In the library produced, we have selected up to six hundred samples from various suppliers, including from Mainland China, Taiwan, Hong Kong, and Sigma-Aldrich in the USA. Multiple instances of one type of oil from different sources have been selected in many cases to provide a complete database.

These samples are subjected to a MALDI-MS analysis as disclosed in the protocols of the present disclosure in step 54.

The MALDI-MS spectra obtained from the analysis at step 56 are passed through an optional quality assurance review at step 58 involving review by a trained operator. During this review, the operator reviews the calibration and resolution of the spectrum in order to ensure that the best reference spectra are included in the library. A high level of oxidised products and/or poor calibration will cause the data to be rejected by the human reviewer. It would be appreciated that although this step is optional, particularly in the generation of a library of spectral data, review of the spectral data ensures that readily apparent errors or tainted samples are rejected, which in turn increases the integrity of the library of spectral data obtained.

Following the quality assurance review, the MALDI-MS spectra are stored in the spectral database at step 59, thereby forming a library of spectral data of edible oil samples. Optionally, in order to increase the accuracy of the presumptive identification made using the samples in the library, a plurality of different samples for a known oil type may be obtained from a number of different manufacturers. It is anticipated that, notwithstanding the origin of the oil being from a number of different manufacturers, fairly similar spectra will be observed.

A further part of the system of the present disclosure includes the sample analysis system referred to at box 60 of FIG. 5a. It would be appreciated that the sample analysis allows for the detection of adulterated oils (62a) (cheaper oils mixed with expensive oils and mislabelled as pure, often more expensive, oils) and “gutter” oils (oils which have already been used for cooking and have been recycled), as well as the identification of pure edible oils or mixtures of edible oils from an unknown sample (62b).

The unknown samples are subjected to MALDI-MS analysis at step 64 in accordance with the earlier described protocol of the present disclosure to produce the sample MALDI-MS spectra at step 66.

Referring now to the identification component 70 of the present disclosure, the series of steps in an exemplary identification system is described in overview.

The sample spectra 66 are matched at the algorithmic matching step 72 by a processor of a computer. The algorithmic matching may be a form of comparison such as a cosine similarity test or similar by which the sample data 66 is compared against the spectral database 59 which has been obtained for a plurality of edible oil samples in the library generation component 50. This process is disclosed in more detail with reference to FIG. 5b.

The outcome of the algorithmic matching between the sample spectra 66 and the spectral database 59 is one or more similarity scores according to the edible oil samples present in the library.

As is appreciated by a person skilled in the art, the higher the similarity score produced by whatever algorithm is used for matching the spectra of the sample with the corresponding spectra in the edible oil library, the higher the chance of presumptive identification of the unknown edible oil sample.

It would be appreciated that this process is similar, notwithstanding whether the unknown edible oil sample is an adulterated edible oil sample, a pure edible oil sample or a mixture of edible oils: the matching with the library via the algorithm and resulting presumptive identification will be generally the same.

Referring now to FIG. 5b, the process for the processing of the spectra of an unknown sample is disclosed in more detail. At 66 the user supplies the raw MALDI-MS spectra for analysis. It would be appreciated that this could be supplied as a data file to a central processing facility which is geographically remote from the place at which the MALDI-MS analysis was performed. Alternatively, the acquisition of the spectra and the processing could be conducted at the same facility, without detracting from the present disclosure.

Once the raw MALDI-MS spectra have been obtained from user input, the spectra are calibrated at step 72a. Calibration is typically conducted using prominent TAG peaks and the DHB matrix peaks as a reference. This calibration process ensures that the samples are standardized, for ease of analysis and reproducibility of analysis against the library of standardized edible oil spectra.
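The disclosure does not specify the mathematical form of the calibration at step 72a. As a minimal sketch only, the following assumes a linear correction of the m/z axis fitted by least squares to reference peaks. The function names and the observed peak positions are hypothetical; the expected positions reuse the canola TAG peaks at 881.8 and 907.8 Da mentioned above.

```python
def fit_linear_calibration(observed, expected):
    """Least-squares fit of expected = slope * observed + intercept.

    Assumption: a linear m/z correction is adequate; the source does not
    state which calibration model is used.
    """
    n = len(observed)
    if n < 2 or n != len(expected):
        raise ValueError("need at least two matched reference peaks")
    mean_obs = sum(observed) / n
    mean_exp = sum(expected) / n
    sxx = sum((x - mean_obs) ** 2 for x in observed)
    sxy = sum((x - mean_obs) * (y - mean_exp)
              for x, y in zip(observed, expected))
    slope = sxy / sxx
    intercept = mean_exp - slope * mean_obs
    return slope, intercept


def apply_calibration(peaks, slope, intercept):
    """Shift every (m/z, intensity) reading onto the calibrated m/z axis."""
    return [(slope * mz + intercept, intensity) for mz, intensity in peaks]


# Hypothetical observed positions of two reference TAG peaks, each 0.3 too high.
observed_refs = [882.1, 908.1]
expected_refs = [881.8, 907.8]
slope, intercept = fit_linear_calibration(observed_refs, expected_refs)
calibrated = apply_calibration([(882.1, 40.0), (908.1, 100.0)], slope, intercept)
```

With more than two reference peaks (for example, TAG peaks together with DHB matrix peaks) the same fit averages out the error of any single peak.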

Alternative matrices could also be used, such as CHCA (α-cyano-4-hydroxycinnamic acid) and SA (sinapinic acid). Based upon whichever matrix is used, appropriate calibration with its characteristic peaks could also be performed. However, it was noted that the background noise was higher if CHCA was used, and the signal intensity was very poor if sinapinic acid was used as the matrix.

To assist in obtaining reproducible results, as a matter of best laboratory practice the mass spectrometer should also be calibrated with a PEG solution mixture (PEG600/PEG1000/PEG2000/NaI=1/2/2/5 (v/v)) before conducting analysis.

Following calibration, a binning process is conducted on the data at step 72b. As is known to persons skilled in the art, data binning or bucketing is a data pre-processing technique which quantizes the data. Basically, in data binning, the original data series values which fall in a given small window or bin are replaced by a value which is representative of that interval, often the centre value.

As is known in the art, data binning reduces the amount of data (necessarily losing information) but facilitates analysis. In the MALDI-MS spectra analysis, the typical size of bin was 0.5 m/z, although other sizes could be used. It would be appreciated by a person skilled in the art that an increased bin size will decrease the resolution of the data obtained. The size of bin affects the accuracy and quality of the matching, and the optimal size of the bin represents a balance between too much detail and too low resolution.

In an exemplary embodiment of the present invention the data binning process used was dividing the entire spectra into intervals of 0.5 m/z, averaging the intensity of all readings within each bin, and setting the m/z reading as the m/z value at the middle of the interval.
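The binning step described above can be sketched as follows. This is an illustrative sketch, not the implementation used in the embodiment: the function name is hypothetical, and it assumes that bin edges fall on multiples of the bin width starting from 0 m/z, which the disclosure does not state.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, width=0.5):
    """Quantise (m/z, intensity) readings into fixed-width bins.

    Each bin is keyed by the m/z value at the middle of its interval and
    holds the mean intensity of the readings that fall inside it.
    Assumption: bin edges lie on multiples of `width` starting at 0.
    """
    if width <= 0:
        raise ValueError("bin width must be positive")
    grouped = defaultdict(list)
    for mz, intensity in peaks:
        index = math.floor(mz / width)  # interval the reading falls in
        grouped[index].append(intensity)
    return {
        (index + 0.5) * width: sum(values) / len(values)
        for index, values in sorted(grouped.items())
    }


# Two readings share the 907.5-908.0 interval and are averaged.
readings = [(907.60, 80.0), (907.80, 100.0), (881.75, 40.0)]
binned = bin_spectrum(readings)
```

Passing a larger `width` merges neighbouring peaks into one bin, which is the loss of resolution noted above.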

Following the data binning process at step 72b, the data was normalized by dividing the intensity of each bin by the maximum intensity of all bins, multiplying by 10000 and rounding to the nearest integer.

Alternatively, the sample data could be normalised by dividing the intensity of each bin by the total intensity of all bins, and then multiplying the result by 10000 and rounding.
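Both normalisation options can be sketched in one short function. The function and parameter names are hypothetical, and the input is assumed to be a mapping from bin-centre m/z to mean intensity. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding convention used in the embodiment.

```python
def normalise(binned, mode="max", scale=10000):
    """Scale binned intensities to integers.

    mode="max"   divides each bin by the largest bin intensity.
    mode="total" divides each bin by the sum of all bin intensities.
    """
    if mode not in ("max", "total"):
        raise ValueError("mode must be 'max' or 'total'")
    if not binned:
        return {}
    values = binned.values()
    denominator = max(values) if mode == "max" else sum(values)
    if denominator <= 0:
        raise ValueError("spectrum has no positive intensity")
    return {mz: round(intensity / denominator * scale)
            for mz, intensity in binned.items()}


binned = {881.75: 40.0, 907.75: 90.0}
by_max = normalise(binned, mode="max")      # base peak becomes 10000
by_total = normalise(binned, mode="total")  # bins sum to roughly 10000
```

Normalising to the maximum fixes the base peak at the scaling value, while normalising to the total keeps each bin as a share of the overall signal.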

Following the normalization process, the unknown sample data which has been calibrated, binned, and normalized, is compared against a database of reference spectra of edible oils (where the information in that database is for edible oils which have been similarly calibrated, binned and normalized) in step 74. This comparison may be conducted using a cosine similarity approach.

As is known in the art, cosine similarity matching is a measure of similarity in which the unknown sample spectrum is represented as a vector, and the dot product of that vector with each of a plurality of vectors representing the sample data in the library is obtained.

Cosine similarity calculates the cosine of the angle between two vectors.

Where vectors are in generally the same direction, the cosine of the angle between the two vectors is near to 1, or 100%. However, where the vectors (abstractions of the spectral data) are close to orthogonal, the cosine of the angle between the two vectors, which is obtained from the dot product divided by the product of the vector magnitudes, is near zero, that is 0%. Furthermore, where the vectors point in exactly opposite directions, the cosine similarity obtained is −1. Because spectral intensities are non-negative, scores for MALDI-MS spectra fall between 0 and 1 in practice.

Accordingly, the cosine similarity is a useful algorithm to obtain a numerical score which represents the degree of similarity between two spectra.
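As an illustration, cosine similarity between two binned spectra can be computed as below. The sketch assumes each spectrum is stored as a mapping from bin-centre m/z to intensity, with missing bins treated as zero; this storage format and the function name are assumptions, not details from the disclosure.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two spectra treated as vectors.

    Bins absent from one spectrum count as zero intensity.
    Returns 0.0 if either spectrum is empty or all zero.
    """
    shared = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


a = {881.75: 4444, 907.75: 10000}
a_doubled = {mz: 2 * v for mz, v in a.items()}  # same shape, different scale
unrelated = {603.75: 10000}                     # no bins in common with a

same_shape_score = cosine_similarity(a, a_doubled)
no_overlap_score = cosine_similarity(a, unrelated)
```

The score depends only on the relative shape of the two spectra, so a spectrum and a uniformly scaled copy of it score 1, while spectra with no bins in common score 0.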

Optionally, other data processing technique such as characteristic peakmatching methods (mainly for detecting the presence or absence ofoxidation products or cyclopeptides), partial least squares discriminantanalysis (PLS-DA), or decision-tree based techniques (e.g. randomforest) could also be used to compare the calibrated, binned andnormalized MALDI-MS sample data with the similarly calibrated, binnedand normalized data of the reference spectra.

The outcome of the process at step 76 is the identification of the oil.

Advantageously, as is depicted in successive Figures, the identification may be provided in the form of a ranked series of cosine similarity scores for a plurality of samples of the library, or alternatively may simply be the identification with the highest score.
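A ranked output of this kind might be produced along the following lines. This is a sketch only: the library here is a small dictionary of made-up vectors, and the names rank_library and library are hypothetical rather than taken from the disclosure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_library(sample, library):
    """Score the sample against every reference and sort best-first."""
    scores = [(name, cosine_similarity(sample, reference))
              for name, reference in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Illustrative four-bin "spectra"; real spectra would have far more bins.
library = {
    "palm": [98, 82, 12, 4],
    "canola": [10, 20, 100, 90],
    "corn": [40, 100, 60, 20],
}
unknown = [100, 80, 10, 5]

ranked = rank_library(unknown, library)
best_name, best_score = ranked[0]
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

Returning the whole ranked list, rather than only the top entry, lets the user see how far the best match stands above the next candidates.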

Referring now to FIG. 6a, there can be seen an exemplary reference spectrum for an edible oil sample S88 (canola). The prominent TAG peak region 80 is circled in FIG. 6a for ease of reference and is shown in an enlarged view in FIG. 6b.

On the left hand side of the sample, it can be seen that there are a plurality of samples of canola oil 82 in which the details of the manufacturer and collection source have been recorded at 84.

There is also optionally the ability to view the raw numerical data in text format by selection of the link 86.

Most edible oils have one group of TAG peaks at around 870-885 Da, and another group of TAG peaks at 900-910 Da. Some edible oils, such as coconut oil, have their TAG peaks in a different region. For each type of oil, the ratio of each TAG is largely determined by the enzymes of the parent species. Therefore, the relative intensities of peaks within the TAG region are specific to each type of oil, and the intensities of the peaks form a distinctive shape for each type of oil. Hence, the location and shape of the TAG regions could be used as a fingerprint to identify the oil type of an unknown.

Accordingly, the present disclosure provides the ability for users to browse the reference spectra of multiple different types of edible oils and, in many cases, of multiple samples of a particular edible oil.

This sample reference data may be stored as a series of raw numerical data, but represented for ease of human interpretation in the graphical format depicted.

Referring now to FIG. 7a, the output from the authentication process of an unknown oil sample can be seen. Advantageously, the file may be included at the portion of the screen indicated by 88, with the output indicated by the correlation score shown in the region below at 90. In this example, the unknown oil spectrum in text format has been presumptively identified as a palm oil with a relatively strong correlation score of 0.9959.

Referring now to FIG. 7b, it can be seen that outputs are presented for two samples, at 92 and 94 respectively.

For the sample identified at 92 as butter, the correlation score is relatively lower (0.9410), meaning that there is not as much confidence in the presumptive identification.

Similarly, at 94, the edible oil identified is pumpkin seed oil; however, the correlation score is also relatively low.

(Note: The ‘low score’ may depend on the usage. If the user just wants to identify an unknown with no other information, the user may be satisfied with a score of 0.97, as it would be the closest match.

However, if the user wants to know if an oil of known type has been adulterated or heated for some time, the threshold for correct matching needs to be higher, as a lower score would mean that the TAGs in the oil have been changed somehow.

Generally speaking, based upon the experimental data obtained to date, a threshold score of 0.97 seems to be a reasonable level for most purposes.)
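As an informal sketch of how such a threshold might be applied, the following compares the scores quoted for FIGS. 7a and 7b against the 0.97 figure. The function name and the idea of passing a stricter threshold for adulteration checks are assumptions for illustration; the text does not give a specific higher value.

```python
DEFAULT_THRESHOLD = 0.97  # figure quoted in the text; not a universal constant


def is_confident_match(score, threshold=DEFAULT_THRESHOLD):
    """Return True when the similarity score meets or exceeds the threshold."""
    return score >= threshold


# Scores quoted in the description for FIG. 7a (palm oil) and FIG. 7b (butter).
results = {"palm oil": 0.9959, "butter": 0.9410}
flags = {name: is_confident_match(score) for name, score in results.items()}
print(flags)

# A stricter, purely hypothetical threshold for checking an oil of known type.
strict = is_confident_match(0.98, threshold=0.99)
```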

These results may be compared to the high correlation score shown in FIG. 7a for palm oil.

Accordingly, the user reviewing these results in view of the weak correlation scores would be less confident of the presumptive identifications of the latter samples.

Referring now to FIG. 8a, there can be seen an exemplary spectrum of poor quality. As represented by the peaks at 917.7 and 933.7 (100 and 102 respectively), there is a high level of oxidation products present in the sample. Accordingly, with the human quality assurance step of FIG. 5a (58), this sample would not be included in establishing the spectral database.

Alternatively, if such a spectrum were obtained from an unknown sample, the system would generate a poor correlation score against the oil spectra in the database, and the user would know that the oil has been somehow modified (adulterated, heated, stored for too long, etc.).

Referring now to FIG. 8b, there can be seen a much better spectrum of a normal sunflower oil sample in which the oxidation products 100, 102 are clearly not present.

Similarly, referring now to FIG. 9a, there is depicted a poorly calibrated spectrum for corn oil, in which the typical TAG peaks associated with corn oil have been shifted 0.3 Da to the right.

This is apparent when the poorly calibrated spectrum of FIG. 9a is compared to the reference spectrum of FIG. 9b: the TAG peaks 110 and 112 are clearly displaced. Accordingly, the spectrum of FIG. 9a is adjusted by 0.3 Da during calibration to provide the properly calibrated spectrum depicted in FIG. 9c.
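The effect of such an adjustment can be sketched as a constant m/z offset applied to every point of the spectrum. This is a deliberate simplification: the calibration of the disclosure relies on TAG and DHB matrix reference peaks, whereas the sketch below uses a single peak, and the m/z values and function names are hypothetical.

```python
def estimate_offset(observed_peak_mz, reference_peak_mz):
    """Offset (in Da) that moves the observed peak onto the reference peak."""
    return reference_peak_mz - observed_peak_mz


def shift_mz(spectrum, offset):
    """Return a copy of the spectrum with every m/z value moved by offset."""
    return [(mz + offset, intensity) for mz, intensity in spectrum]


# Hypothetical (m/z, intensity) pairs with peaks sitting 0.3 Da too high.
observed = [(879.9, 100.0), (881.9, 60.0)]
reference_peak = 879.6  # hypothetical expected position of the first peak

offset = estimate_offset(observed[0][0], reference_peak)
corrected = shift_mz(observed, offset)
print(round(offset, 3), [(round(mz, 3), i) for mz, i in corrected])
```

A single constant offset only corrects a uniform shift; where the error varies across the mass range, several reference peaks would be needed.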

Another example of poor calibration and resolution can be seen in FIG. 9d, in which the TAG pattern of camellia oil is depicted. Looking at the characteristic peaks 114 and 116 for this camellia oil sample, it can be seen that again the sample has been poorly calibrated, and suffers from poor resolution in the region 118 between 900 and 910. The peaks obtained should be sharp and the baseline should be low. The characteristic peaks in FIG. 9d seem too broad and the baseline is raised.

Accordingly, following review and quality assurance by an operator, the re-analysis of the camellia oil sample, as depicted in FIG. 9e, gives a much better, more robust TAG pattern, in which the calibration and resolution problems have been addressed.

Referring to FIG. 10a, there is a further example of the necessity for quality assurance to be conducted on the samples which form the reference library.

It can be seen that the problematic spectrum in FIG. 10a(i) has an entirely different peak pattern from the flaxseed oil samples depicted in FIGS. 10a(ii) and 10a(iii). Following the identification of the mismatch between the claimed reference sample of flaxseed oil and the other reference samples in the reference library, the problematic sample could be subjected to a GC-FID analysis. As depicted in FIG. 10b, the outcome of such analysis reveals that the sample has indeed been mislabelled, and the sample would no longer be included in the reference database as being flaxseed oil.

Accordingly, the integrity of the reference library for identification of the edible oil samples can be increased.

The present invention provides an advantageous, potentially scalable method of identifying edible oils. This enables the rapid detection of mislabelled edible oils, the identification of adulterated oils and gutter oils, as well as the ability to authenticate labelled oil spectra.

It is also possible for the major and minor components of a mixed oil to be identified, through comparison with reference samples in the reference library such as those depicted in FIGS. 3a to 3e.

In this usage, the user can check the relative proportions claimed on the label of the edible oil with the actual detected compositions.

Another advantage of the present invention is that the MALDI-MS analysis can be conducted at an analytical laboratory while the reference library may be located at a location which is geographically remote from that analytical laboratory. The presumptive identification can then take place over the internet, with the data simply uploaded onto an appropriate website.

The analysis and presumptive identification of the edible oils can be carried out as a routine laboratory procedure, without the need for ongoing training or exhaustive statistical analysis.

The algorithm used for matching the spectra provides a reliable, scalable and efficient way of presumptively identifying a wide variety of edible oils against a reference library.

The inclusion of the automated data matching process removes the need for human matching of the reference spectra.

Unlike the previous PCA approach, the algorithm does not need to be modified if a new type of oil is added to the database.

The results can also be displayed to the user automatically, which is not possible with the previous approaches.

The above embodiments are described by way of example only. Many variations are possible without departing from the scope of the invention as defined in the appended claims.

For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software.

Methods according to the above-described examples can be implemented using computer-executable processes that are stored or otherwise available from computer readable media. Such processes can comprise, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, flash memory, universal serial bus (USB) devices provided with non-volatile memory, networked storage devices, and so on.

Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include laptops, smart phones, small form factor personal computers, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.

The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.

Although a variety of examples and other information was used to explain aspects within the scope of the appended claims, no limitation of the claims should be implied based on particular features or arrangements in such examples, as one of ordinary skill would be able to use these examples to derive a wide variety of implementations.

Further and although some subject matter may have been described in language specific to examples of structural features and/or method steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to these described features or acts. For example, such functionality can be distributed differently or performed in components other than those identified herein. Rather, the described features and steps are disclosed as examples of components of systems and methods within the scope of the appended claims.

1. A method for analysing one or more edible oil samples, the method comprising: receiving, by a processor, matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data for one or more edible oil samples, calibrating, by the processor, the MALDI-MS data for the one or more edible oil samples to obtain calibrated MALDI-MS data by using reference peaks selected from triacylglycerol (TAG) peak(s) and 2,5-dihydroxybenzoic acid (DHB) matrix peak(s) of the MALDI-MS data of the one or more edible oil samples; comparing, by the processor, the calibrated MALDI-MS data derived from the one or more samples against a library comprising calibrated MALDI-MS data for a plurality of edible oil samples, wherein the library of calibrated MALDI-MS data is obtained by calibrating reference peaks selected from triacylglycerol (TAG) peak(s) and 2,5-dihydroxybenzoic acid (DHB) matrix peak(s) of the MALDI-MS data for the plurality of edible oil samples, wherein such comparison does not include principal component analysis (PCA) analysis; and determining a most likely composition of the one or more edible oil samples.
2. The method for analysing one or more edible oil samples according to claim 1, wherein the comparison between the calibrated MALDI-MS data derived from the one or more samples and the library of calibrated MALDI-MS data for the plurality of edible oil samples is performed using a Cosine similarity test.
3. The method for analysing one or more edible oil samples according to claim 1, wherein the most likely composition of the one or more edible oil samples is determined after ranking, based on the calibrated MALDI-MS data, a plurality of calibrated MALDI-MS data of known edible oil types in the library according to their likelihood of being in the one or more edible oil samples, from most likely to least likely using cosine similarity test scores; and determining the most likely identification of the one or more edible oil samples based upon the highest ranked cosine similarity score.
4. The method for analysing one or more edible oil samples according to claim 2, wherein the cosine similarity test is conducted on one or more regions of the calibrated MALDI-MS data derived from the one or more samples selected from the group comprising high mass, low mass and TAG regions and one or more corresponding regions of the edible oil samples in the library of calibrated MALDI-MS data.
5. The method for analysing one or more edible oil samples according to claim 1, wherein following calibration and before comparison with the library of calibrated MALDI-MS data, the calibrated MALDI-MS data derived from the one or more samples is quantised by data binning.
6. The method for analysing one or more edible oil samples according to claim 5, wherein the data binning is performed by dividing an entire spectra into intervals of 0.5 m/z, averaging intensity of all readings within each bin, and setting an m/z reading as a m/z value at the middle of the interval.
7. The method for analysing one or more edible oil samples according to claim 6, wherein following binning, the calibrated MALDI-MS data derived from the one or more samples is normalised by dividing an intensity of each bin with either a maximum intensity or a total intensity of all bins, multiplying by an appropriate scaling parameter and rounding to a nearest integer.
8. The method for analysing one or more edible oil samples according to claim 1, wherein the comparison between the calibrated MALDI-MS data derived from the one or more samples and the library of calibrated MALDI-MS data for the plurality of edible oil samples is performed using a statistical test selected from the group consisting of characteristic peak matching methods, partial least squares discriminant analysis (PLS-DA), and decision tree-based methods.